1 Introduction



We consider the problem of estimating the CDF in the contexts of independent and
identically distributed (iid) data and randomly right-censored data. The seminal
paper of Kaplan and Meier [11] solves this problem with the product-limit estimator,
the nonparametric maximum likelihood estimator of the CDF, but there is still room
for improvement, especially when the sample size is small.

The most obvious drawback of the Kaplan-Meier estimator, like the empirical
distribution function (EDF), is its lack of smoothness. Kernel smoothing easily remedies this
problem, but it also introduces the two new issues of choosing the best kernel and bandwidth.
Kernel smoothing also improves the mean square error (MSE) of the estimator
by decreasing its variance while introducing a slight bias, resulting in an overall
improvement of the MSE. The MSE improvement, however, is typically only a second-order
improvement, since the original estimator's first-order MSE convergence rate already
achieves the best-possible $\sqrt{n}$-rate. When the asymptotic relative efficiency (ARE)
between the Kaplan-Meier estimator and its smoothed counterpart is one, as is typically
the case, the distinction in performance can be measured by considering the asymptotic
relative deficiency, or simply the deficiency, between the two estimators. The
general notion of deficiency and the subsequent calculations with the proposed estimators are
provided in Section 3, which also illustrates that an actual increase in efficiency can
be achieved with the new estimators under certain (rather strong) assumptions on the
distribution function.

Higher-order MSE improvement is influenced by the kernel order: the higher the
kernel order, the greater the improvement. Therefore the best kernel-based
estimators, the ones with smallest asymptotic MSE, are the estimators that use infinite-order
kernels. Current methods traditionally invoke second-order kernels [23], and more
recently a hybrid kernel estimator has been investigated [13], but infinite-order kernel
methods allow for the greatest improvement in bias rates without affecting the rates
of the variance. The main argument against the use of large-order kernels in density
estimation is the concern that the estimator may be negative on some intervals when
it is known that the true probability density is always nonnegative. This argument,
however, is moot in the density estimation context (and so also in the CDF estimation
context) since the estimator can easily be truncated to zero where it goes negative
and then renormalized to have a total area of one without affecting the MSE convergence
rate. The general construction of the infinite-order kernel estimators is introduced in the
following section, and a compatible bandwidth selection algorithm that adapts to the
infinite-order kernels is described in Section 4.

Another pitfall of all kernel estimators of the density is the lack of consistency at
boundary points when the support of the density lies in an interval or half-interval.
Simple reflection [25] solves this problem in the density estimation context, and an
analogous fix also exists for CDF estimators. Boundary correction and standardization
methods specific to kernel-smoothed CDF estimators are discussed in Section 5.

Simulations with iid and censored data illustrate the effectiveness of the
infinite-order kernel estimators coupled with the automatic bandwidth selection algorithm
in Section 6. Uniform improvement in MSE over existing estimators is observed in
the simulations. Since estimation of the CDF is so fundamental in standard statistical
analysis, there are many applications of the new estimators beyond just estimating
the underlying CDF. Some of these applications are included in the last section on
Discussion and Conclusions.

2 Estimation with Flat-Top Kernels 

The analysis will be confined to independent and identically distributed (iid) data,
but extensions to randomly right-censored data with possible left truncation can be more
generally derived; cf. [3], [21].

Let $X_1, \ldots, X_n$ be independent and identically distributed random variables in $\mathbb{R}$
with absolutely continuous distribution function $F$ and corresponding probability
density function $f$. (The independence assumption can be relaxed under certain stationarity
and mixing conditions; see [16], [H].) Estimation of $f$ with infinite-order kernels was
considered in [21] and [3]; here we consider the integration of those estimators in the
construction of the CDF estimator.

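The traditional estimator of the CDF is the empirical distribution function, or EDF,
which is given by

$$\hat F(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \le x),$$

where $I(\cdot)$ represents the indicator function. The kernel estimator of the probability
density, $f$, is then given by

$$\hat f_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$

where $K$ is a kernel that integrates to one (but is not necessarily nonnegative!) and $h$ is
the bandwidth parameter. To ensure consistency of $\hat f_h$, $h$ should satisfy the condition
$h \to 0$ as $n \to \infty$ but with $nh \to \infty$.

The smoothed estimator of the CDF, $\hat F_h$, is constructed by integrating $\hat f_h$. That
is,

$$\hat F_h(t) = \int_{-\infty}^{t} \hat f_h(x)\,dx = \frac{1}{n} \sum_{i=1}^{n} \bar K\left(\frac{t - X_i}{h}\right), \tag{1}$$

where $\bar K(t) = \int_{-\infty}^{t} K(x)\,dx$.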
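
To illustrate how an estimator of the form (1) is evaluated, the following Python sketch computes the EDF and a kernel-smoothed CDF side by side. It is a sketch under stated assumptions rather than the estimator studied here: the function names are hypothetical, and a Gaussian integrated kernel (the standard normal CDF) stands in for the integrated infinite-order kernel purely for simplicity.

```python
import math

def edf(data, t):
    """Empirical distribution function: fraction of observations <= t."""
    return sum(1 for x in data if x <= t) / len(data)

def gaussian_integrated_kernel(u):
    """Integral of the standard normal density from -infinity to u (assumed kernel)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def smoothed_cdf(data, t, h, integrated_kernel=gaussian_integrated_kernel):
    """Kernel-smoothed CDF: the average of integrated_kernel((t - X_i) / h)."""
    return sum(integrated_kernel((t - x) / h) for x in data) / len(data)

# Hypothetical toy sample, used only to exercise the two estimators.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]
print(edf(sample, 0.2), smoothed_cdf(sample, 0.2, h=0.5))
```

Any other integrated kernel can be passed through the `integrated_kernel` argument; as the bandwidth shrinks, the smoothed estimate approaches the EDF at points that are not observations.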

The estimator $\hat F_h(t)$ is equivalent to the EDF in terms of first-order asymptotic
performance, but improvements are achieved in the higher-order terms. The estimator
$\hat F_h(t)$ effectively smooths the EDF, decreasing its variance at the cost of introducing a
slight bias. The variance improvement is uniform across different kernels, affecting only
the second-order constant and not the second-order rate (refer to equation (2) below);
however, the additional bias that gets introduced in the smoothing can be reduced
significantly by using kernels of large order, with infinite-order kernels providing the
most benefit. The variance of $\hat F_h(t)$, as derived in [15], is given by

$$\mathrm{var}\left(\hat F_h(t)\right) = \frac{F(t)\,(1 - F(t))}{n} - \frac{2 h f(t)}{n} \int u K(u) \bar K(u)\,du + o\left(\frac{h}{n}\right), \tag{2}$$

with $\bar K(u) = \int_{-\infty}^{u} K(x)\,dx$ the integrated kernel from (1).

The bandwidth parameter $h$ only enters the variance expression through the
second-order term, which is negative. So the larger $h$ is, the smaller the variance of $\hat F_h(t)$
becomes. However, we will see below in Theorem 1 that the smaller $h$ is, the smaller
the bias of $\hat F_h(t)$ becomes. Therefore there is an optimal $h$ that strikes a compromise
between the bias and variance terms, which is presented in Corollary 1 below.

We now construct a family of infinite-order kernels, following [21], that are derived
from "flat-top functions". We start with a continuous, real-valued function $k$ given by

$$k(s) = \begin{cases} 1, & |s| \le c \\ g(s), & \text{otherwise} \end{cases} \tag{3}$$

where $g$ is any continuous, square-integrable function that is bounded in absolute
value by one and satisfies $g(|c|) = 1$. The region $|s| \le c$ is referred to as the
"flat-top neighborhood", but in some cases we may wish to relax the requirement to allow
$g(s) \approx 1$ when $s$ is close to $c$. This "effective flat-top neighborhood" is useful when using
an infinitely smooth function as described in [19] and Section 6 below. The Fourier
transform of $k$ then produces the infinite-order kernel, $K$, of interest. Specifically,

$$K(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} k(s)\, e^{-isx}\,ds. \tag{4}$$

The MSE of $\hat F_h(t)$ with an infinite-order kernel $K$ is now computed under various
assumptions on the smoothness of the underlying density. Let $\phi(t)$ be the characteristic
function corresponding to $f(x)$, i.e.

$$\phi(t) = \int_{-\infty}^{\infty} f(x)\, e^{itx}\,dx.$$

The following three assumptions quantify the degree of smoothness of the density
$f(x)$ by the rate of decay of its characteristic function.

Assumption A(p): There is a $p > 0$ such that $\int_{-\infty}^{\infty} |t|^p\, |\phi(t)|\,dt < \infty$.

Assumption B: There are positive constants $d$ and $D$ such that $|\phi(t)| \le D e^{-d|t|}$.

Assumption C: There is a positive constant $b$ such that $\phi(t) = 0$ when $|t| \ge b$.

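Theorem 1. Let $\hat F_h(t)$ be a kernel-smoothed estimator of the CDF with an
infinite-order kernel derived from a flat-top function.
(i) Suppose assumption A(p) holds. Then

$$\sup_{t \in \mathbb{R}} \left| \mathrm{bias}\left(\hat F_h(t)\right) \right| = o\left(h^{p+1}\right).$$

(ii) Suppose assumption B holds. Then

$$\sup_{t \in \mathbb{R}} \left| \mathrm{bias}\left(\hat F_h(t)\right) \right| = O\left(h\, e^{-d/h}\right).$$

(iii) Suppose assumption C holds. When $h \le 1/b$,

$$\sup_{t \in \mathbb{R}} \left| \mathrm{bias}\left(\hat F_h(t)\right) \right| = 0.$$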

To optimize the amount of smoothing under the MSE criterion — i.e., to optimize 
the bandwidth h — we choose the bandwidth that allows the squared bias rates to be 
comparable to the second-order variance rates. The optimal bandwidths are provided 
in the following corollary. 

Corollary 1. Let $\hat F_h(t)$ be as in Theorem 1.
(i) Suppose assumption A(p) holds. Letting $h \sim a n^{-\beta}$, where $a$ is any positive
constant and $\beta = (2p + 1)^{-1}$, optimizes the tradeoff between the bias and variance of
$\hat F_h(t)$ and gives

$$\sup_{t \in \mathbb{R}} \left| \mathrm{bias}\left(\hat F_h(t)\right) \right| = o\left(n^{-\frac{p+1}{2p+1}}\right).$$

(ii) Suppose assumption B holds. Letting $h \sim a/\log n$, where $a < 2d$ is a constant,
optimizes the tradeoff between the bias and variance of $\hat F_h(t)$ and gives

$$\sup_{t \in \mathbb{R}} \left| \mathrm{bias}\left(\hat F_h(t)\right) \right| = O\left(\frac{1}{\sqrt{n \log n}}\right).$$

(iii) Suppose assumption C holds. Letting $h \le 1/b$ be fixed guarantees zero bias and
the best possible variance rate.
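
The balance behind part (i) can be checked by hand. The short calculation below is a worked sketch, taking the second-order variance term to be of order $h/n$ as in (2) and the bias to be of order at most $h^{p+1}$ as in Theorem 1; the value $p = 2$ is only an illustrative choice.

```latex
\begin{align*}
\text{squared bias:}\quad & o\!\left(h^{2(p+1)}\right) = o\!\left(n^{-(2p+2)\beta}\right)
  \quad\text{for } h \sim a n^{-\beta}, \\
\text{second-order variance:}\quad & \frac{h}{n} \asymp n^{-(1+\beta)}, \\
\text{balance of exponents:}\quad & (2p+2)\beta = 1 + \beta
  \;\Longrightarrow\; \beta = \frac{1}{2p+1}, \\
\text{common rate:}\quad & n^{-(1+\beta)} = n^{-\frac{2p+2}{2p+1}},
  \qquad\text{e.g. } p = 2:\ \beta = \tfrac{1}{5},\ \text{rate } n^{-6/5}.
\end{align*}
```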

Estimation of the survival function with randomly right-censored data can be
similarly improved by smoothing the Kaplan-Meier estimator with infinite-order
kernels. Density estimation of censored data with infinite-order kernels is analyzed in
[3], and an estimator of the survival function can be similarly derived from this
density estimator through integration as in (1). The same conclusions as Theorem 1 and
Corollary 1 will also hold for the smoothed version of the Kaplan-Meier estimator with
infinite-order kernels. This is detailed in the following theorem, where the proof has
been omitted as it follows naturally from the iid case above.

Define $\hat S_h(t)$ to be a smoothed estimator of the survival function, $S(t) = 1 - F(t)$,
derived from smoothing the Kaplan-Meier estimator with an infinite-order kernel of
the form given in (4); i.e.,

$$\hat S_h(t) = 1 - \sum_{j=1}^{n} s_j\, \bar K\left(\frac{t - X_j}{h}\right), \tag{5}$$

where $s_j$ is the height of the jump of the Kaplan-Meier estimator at $X_j$ (cf. [5] for
more details) and $\bar K$ is the integrated kernel from (1). The following theorem is
consistent with the results described in [14].
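
The jump heights of the Kaplan-Meier estimator are all that is needed to smooth it. The Python sketch below computes them and plugs them into a smoothed survival estimate of the form $1 - \sum_j s_j \bar K((t - X_j)/h)$. The function names are hypothetical, the Gaussian integrated kernel is an assumed stand-in for an integrated infinite-order kernel, and tied event and censoring times are resolved by the common convention of placing events first.

```python
import math

def kaplan_meier_jumps(times, events):
    """Return (time, jump) pairs of the Kaplan-Meier CDF estimate.

    times  -- observed times (event or censoring time)
    events -- 1 if the time is an observed event, 0 if it is censored
    """
    # Sort by time; at tied times, events are processed before censorings.
    order = sorted(range(len(times)), key=lambda i: (times[i], -events[i]))
    n = len(times)
    survival = 1.0
    jumps = []
    for rank, i in enumerate(order):
        at_risk = n - rank
        if events[i] == 1:
            new_survival = survival * (1.0 - 1.0 / at_risk)
            jumps.append((times[i], survival - new_survival))
            survival = new_survival
    return jumps

def smoothed_survival(jumps, t, h):
    """One minus the jump-weighted sum of a Gaussian integrated kernel (assumed kernel)."""
    def integrated_kernel(u):
        return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return 1.0 - sum(s * integrated_kernel((t - x) / h) for x, s in jumps)

# Hypothetical toy data: the second observation is censored.
jumps = kaplan_meier_jumps([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 1])
print(jumps, smoothed_survival(jumps, 2.5, h=0.5))
```

Without censoring every jump equals $1/n$, and the smoothed estimate reduces to one minus the smoothed EDF.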

Theorem 2. Let $\hat S_h(t)$ be a kernel-smoothed estimator of the survival function as in
(5) above. Suppose assumption A(p) holds. Then

$$\sup_{t \in \mathbb{R}} \left| \mathrm{bias}\left(\hat S_h(t)\right) \right| = o\left(h^{p+1}\right) = o\left(n^{-\frac{p+1}{2p+1}}\right)$$

when $h \sim a n^{-\beta}$, where $a$ is any positive constant and $\beta = (2p + 1)^{-1}$.

The analysis under assumptions B and C of the above theorem is considerably
more complex and has been omitted.

3 Deficiency

The notion of deficiency was introduced in the article "Deficiency" by Hodges and
Lehmann [10], wherein several deficiency calculations are provided. Many articles
followed suit, using the deficiency concept to compare kernel-smoothed estimators, but
many of the approaches used in calculating the deficiency strayed from the original
and simple techniques employed by Hodges and Lehmann; cf. [13], [23], [26].
The simplicity of the original deficiency computations is maintained in the proof of
Theorem 3 below.

The deficiency concept is described as follows. Suppose we are given an estimator, $S_m$,
based on a sample of size $m$ and a more efficient estimator, $T_n$, based on a sample of size
$n$ with performance equivalent to that of $S_m$. The difference between the sample sizes,
$d = m - n$, defines the relative deficiency between the two estimators. The original paper of
Hodges and Lehmann mostly dealt with situations where $d$ approaches a finite limit as $n$ goes to
infinity, in which case the two estimators have an asymptotic relative efficiency (ARE)
of one. However, it is still possible for two estimators to have an ARE of one yet
with a deficiency that approaches infinity. Therefore calculation of the rate at which $d$
approaches infinity gives a generalization of the original deficiency concept.

In the following theorem, a formula is derived for computing the generalized defi- 
ciency between two estimators from their MSE performance which explicitly computes 
the rate at which d approaches infinity. 

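Theorem 3. Suppose the mean squared errors of two estimators $S_n$ and $T_n$ are given
as

$$\mathrm{MSE}(S_n) = \frac{c}{n^r} + \frac{a}{n^{r+\delta}} + o\left(\frac{1}{n^{r+\delta}}\right) \quad\text{and}\quad \mathrm{MSE}(T_n) = \frac{c}{n^r} + \frac{b}{n^{r+\delta}} + o\left(\frac{1}{n^{r+\delta}}\right),$$

where $c$, $r$, and $\delta$ are positive constants.
Define $m = m(n)$ to be the sample size for which $\mathrm{MSE}(T_m)$ equals (up to a second-order
term) $\mathrm{MSE}(S_n)$. Then the asymptotic deficiency of $T_n$ relative to $S_n$ is $d = m - n$ and
satisfies

$$d = \frac{b - a}{cr}\, n^{1-\delta} + o\left(n^{1-\delta}\right).$$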
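
The deficiency rate can be sanity-checked numerically. The Python sketch below solves $\mathrm{MSE}(T_m) = \mathrm{MSE}(S_n)$ for $m$ by bisection, using only the two leading terms of each expansion, and compares $d = m - n$ with the value $\frac{b-a}{cr} n^{1-\delta}$. The function names and the constants $c = 1$, $a = -1$, $b = 0$, $r = 1$, $\delta = 1/2$ are illustrative assumptions, and remainder terms are ignored.

```python
def two_term_mse(n, c, second, r, delta):
    """Leading term plus second-order term of an MSE expansion."""
    return c / n**r + second / n**(r + delta)

def matching_sample_size(n, c, a, b, r, delta):
    """Real-valued m solving two_term_mse(m; b) = two_term_mse(n; a), by bisection.

    Assumes the MSE of T is decreasing in m on the bracket [1, 10 n].
    """
    target = two_term_mse(n, c, a, r, delta)
    low, high = 1.0, 10.0 * n
    for _ in range(200):
        mid = 0.5 * (low + high)
        if two_term_mse(mid, c, b, r, delta) > target:
            low = mid      # MSE of T still too large: a bigger sample is needed
        else:
            high = mid
    return 0.5 * (low + high)

# Illustrative constants (assumptions, not values from the paper).
n, c, a, b, r, delta = 10**6, 1.0, -1.0, 0.0, 1.0, 0.5
deficiency = matching_sample_size(n, c, a, b, r, delta) - n
predicted = (b - a) / (c * r) * n ** (1.0 - delta)
print(deficiency, predicted)
```

For these constants the two numbers agree to within a fraction of a percent, and the agreement improves as $n$ grows.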



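In the next theorem, the deficiency of two estimators is calculated when the
second-order term in the MSE expansion decreases at the rate $n^{-r}/\log n$, which is very close to
the leading term of $n^{-r}$. Therefore the deficiency, $d$, will approach infinity at a
faster rate, indicating a larger discrepancy in the performance of the two estimators.

Theorem 4. Suppose the mean squared errors of two estimators $S_n$ and $T_n$ are given
as

$$\mathrm{MSE}(S_n) = \frac{c}{n^r} + \frac{a}{n^{r} \log n} + o\left(\frac{1}{n^{r} \log n}\right) \quad\text{and}\quad \mathrm{MSE}(T_n) = \frac{c}{n^r} + \frac{b}{n^{r} \log n} + o\left(\frac{1}{n^{r} \log n}\right).$$

Define $m = m(n)$ to be the sample size for which $\mathrm{MSE}(T_m)$ equals (up to a second-order
term) $\mathrm{MSE}(S_n)$. Then the asymptotic expected deficiency of $T_n$ relative to $S_n$ is
$d = m - n$ and satisfies

$$d = \frac{b - a}{cr}\, \frac{n}{\log n} + o\left(\frac{n}{\log n}\right).$$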

These formulas, combined with the results of Corollary 1 and equation (2), are
used to derive the deficiency of infinite-order kernel estimators relative to the unsmoothed EDF
under the assumptions A(p), B, and C. In the case of assumption C, the improvement
in MSE performance is first-order, and therefore an improvement in terms of efficiency, or
ARE, is present.

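Corollary 2. Let $\hat F_h(t)$ be as in Theorem 1 and $\hat F$ be the empirical distribution function
estimator. Assume $F(t)\,(1 - F(t)) \ne 0$.
(i) Suppose assumption A(p) holds. When $h \sim a n^{-\beta}$, where $a > 0$ is constant and
$\beta = (2p + 1)^{-1}$, the deficiency of $\hat F_h(t)$ relative to $\hat F(t)$ is

$$\left(\frac{2 a f(t) \int u K(u) \bar K(u)\,du}{F(t)\,(1 - F(t))}\right) n^{1-\beta}.$$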

(ii) Suppose assumption B holds. When $h \sim a/\log n$, where $a < 2d$ is a constant, the
deficiency of $\hat F_h(t)$ relative to $\hat F(t)$ is

$$\left(\frac{2 a f(t) \int u K(u) \bar K(u)\,du}{F(t)\,(1 - F(t))}\right) \frac{n}{\log n}.$$

(iii) Suppose assumption C holds. When $h \le 1/b$ is constant, the deficiency of $\hat F_h(t)$
relative to $\hat F(t)$ is

$$\left(\frac{2 h f(t) \int u K(u) \bar K(u)\,du}{F(t)\,(1 - F(t))}\right) n.$$

4 Bandwidth Selection 

We now present a simple bandwidth selection algorithm that requires minimal
computation and adapts to the specialized family of infinite-order kernels that is utilized
in this paper. The methods suggested for iid data and, in [3], for censored
data present an algorithm that automatically selects the optimal bandwidth in density
estimation. Remarkably, these same algorithms can also be used to select the best
bandwidth in CDF estimation. Although the bias in estimating the CDF is smaller
than the bias of the density estimators, the variance of the CDF estimator is also smaller
than the variance of the density estimator. This algorithm automatically adapts to the
appropriate assumption A(p), B, or C and generates a bandwidth that is consistent
for the ideal bandwidth given by Corollary 1. The algorithm is computationally
light as well as simple to describe, and we now proceed to describe it.
Let $\hat\phi$ be the natural estimate of the characteristic function given by

$$\hat\phi(t) = \int_{-\infty}^{\infty} e^{itx}\,d\hat F(x).$$

In the context of censored data, $\hat F(x)$ in the above expression is replaced with the
Kaplan-Meier estimator of the CDF. The key to the algorithm is finding when
$\hat\phi(t) \approx 0$; more specifically, determining the smallest value $t^*$ such that $\hat\phi(t) \approx 0$ for all
$t \in (t^*, t^* + \epsilon)$ for some pre-specified $\epsilon$. Then the estimate of the bandwidth is given
by $\hat h = 1/t^*$. The formal algorithm is presented below.
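Bandwidth Selection Algorithm

Let $C > 0$ be a fixed constant, and $\epsilon_n$ be a nondecreasing sequence of
positive real numbers tending to infinity such that $\epsilon_n = o(\log n)$. Let $t^*$ be
the smallest number such that

$$|\hat\phi(t^* + t)| < C \sqrt{\frac{\log_{10} n}{n}} \quad \text{for all } t \in (0, \epsilon_n). \tag{6}$$

Then let $\hat h = c/t^*$, where $c$ is the "flat-top radius" depicted in equation (3).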
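
The search for $t^*$ can be sketched in a few lines of Python. The code below is only an illustration under explicit assumptions: the threshold is taken to have the form $C\sqrt{\log_{10} n / n}$, the search runs over a finite grid with a fixed window length in place of $\epsilon_n$, and the function names, default values, and the synthetic sample of standard normal quantiles are all hypothetical.

```python
import cmath
import math
from statistics import NormalDist

def ecf_modulus(data, t):
    """Modulus of the empirical characteristic function at frequency t."""
    return abs(sum(cmath.exp(1j * t * x) for x in data)) / len(data)

def select_bandwidth(data, c=0.75, C=1.0, window=1.0, step=0.05, t_max=20.0):
    """Grid search for the smallest t* whose following window stays below the threshold.

    Assumed threshold: C * sqrt(log10(n) / n). Returns c / t*, or c / t_max
    if no grid point qualifies.
    """
    n = len(data)
    threshold = C * math.sqrt(math.log10(n) / n)
    k = max(1, round(window / step))            # grid points per window
    n_grid = round(t_max / step) + k + 1
    below = [ecf_modulus(data, i * step) < threshold for i in range(n_grid)]
    for i in range(1, n_grid - k):              # start at 1 so that t* > 0
        if all(below[i + 1 : i + 1 + k]):
            return c / (i * step)
    return c / t_max

# Synthetic sample: 200 evenly spaced standard normal quantiles (illustrative only).
n = 200
data = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
h = select_bandwidth(data)
print(h)
```

For censored data the equal weights $1/n$ inside `ecf_modulus` would be replaced by the Kaplan-Meier jump heights.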

The positive constant $C$ is irrelevant in the asymptotic theory, but is relevant for
finite-sample calculations. The main idea behind the algorithm is to determine the
smallest $t$ such that $\hat\phi(t) \approx 0$. In most cases this can be seen visually without explicitly
computing the threshold in (6).



5 Boundary Correction and Standardization



Vanilla versions of the kernel estimators for density estimation break down when the
support of the density is restricted to a subset of the real line. For instance, in
estimating the probability density function of data taken from an exponential distribution,
most kernel estimators give substantial area to negative values even when it is known
that the support of the density is nonnegative. It is not too difficult to see that simple
kernel estimators of the density will not be consistent at the boundary of the density's
support; cf. [25]. However, a simple remedy by reflection works well when the support
is not too complex. For instance, when the support of the density is $[a, \infty)$, the
estimator

$$\tilde f_h(x) = \left(\hat f_h(x) + \hat f_h(2a - x)\right) 1_{[a,\infty)}(x) \tag{7}$$

is consistent at the boundary point $a$ ([25]).

This problem, therefore, also carries over to the situation of estimating the CDF.
Indeed the EDF and Kaplan-Meier estimators do not suffer from this drawback, but
the kernel-smoothed versions do. By integrating (7), we deduce a boundary-corrected
version of the kernel-smoothed CDF estimator with the same formulation as (7). For
$t \in [a, \infty)$,

$$\begin{aligned}
\tilde F_h(t) &= \int_{-\infty}^{t} \tilde f_h(x)\,dx \\
&= \int_{a}^{t} \left(\hat f_h(x) + \hat f_h(2a - x)\right) dx \\
&= \hat F_h(t) - \hat F_h(a) + \int_{2a-t}^{a} \hat f_h(x)\,dx \\
&= \hat F_h(t) - \hat F_h(a) + \hat F_h(a) - \hat F_h(2a - t) \\
&= \hat F_h(t) - \hat F_h(2a - t)
\end{aligned}$$

and $\tilde F_h(t) = 0$ when $t < a$. In the special case $a = 0$, we have the simple formula

$$\tilde F_h(t) = \left(\hat F_h(t) - \hat F_h(-t)\right) 1_{[0,\infty)}(t).$$
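
The reflection formula is straightforward to apply once a smoothed CDF is available. The following Python sketch implements the corrected estimator as the smoothed CDF at $t$ minus the smoothed CDF at the reflected point $2a - t$, for a support of the form $[a, \infty)$. The function names and the toy sample are hypothetical, and a Gaussian integrated kernel is assumed in place of an infinite-order one.

```python
import math

def smoothed_cdf(data, t, h):
    """Kernel-smoothed CDF with a Gaussian integrated kernel (assumed kernel)."""
    def integrated_kernel(u):
        return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    return sum(integrated_kernel((t - x) / h) for x in data) / len(data)

def reflected_cdf(data, t, h, a=0.0):
    """Boundary-corrected CDF for support [a, infinity) via reflection about a."""
    if t < a:
        return 0.0
    return smoothed_cdf(data, t, h) - smoothed_cdf(data, 2.0 * a - t, h)

# Hypothetical nonnegative sample.
lifetimes = [0.2, 0.5, 0.9, 1.4, 2.3]
for t in (0.0, 0.5, 1.0, 3.0):
    print(t, smoothed_cdf(lifetimes, t, 0.4), reflected_cdf(lifetimes, t, 0.4))
```

The uncorrected estimate is strictly positive at the boundary $t = 0$, whereas the reflected version starts at exactly zero.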

There is an additional issue that only affects higher-order kernel estimators and not
second-order estimators. Specifically, higher-order kernel estimators of the density are
not necessarily nonnegative, which means higher-order kernel estimators of the CDF
are not necessarily contained within the range [0, 1] or forced to be nondecreasing. The
natural remedy for these density estimators is to truncate negative estimates to zero
and then renormalize the area to one. When this is performed, the corresponding CDF
estimator will be a valid CDF. However, this approach causes the kernel estimator of
the CDF to lose the simple representation that is given in the right-hand side of
(1), so instead a simple alternative standardization technique is suggested. To ensure
the estimator is nondecreasing, $\hat F_h(t)$ is replaced by $\sup_{s \in (-\infty, t]} \hat F_h(s)$, and to ensure the
range is between 0 and 1, $\min(\hat F_h(t), 1)$ and $\max(\hat F_h(t), 0)$ are invoked.

Replacing $\hat F_h(t)$ with $\sup_{s \in (-\infty, t]} \hat F_h(s)$ is equivalent to replacing the estimator of the
density $\hat f_h(x)$ with the truncated version $\hat f_h^{+}(x) = \max(\hat f_h(x), 0)$ and then integrating
the truncated density estimator from $-\infty$ to $t$. Since $\hat f_h^{+}(x)$ has better MSE
performance than the nontruncated counterpart $\hat f_h(x)$ [22], it follows that the nondecreasing
estimator $\sup_{s \in (-\infty, t]} \hat F_h(s)$ has better MSE performance than the original $\hat F_h(t)$.
Similarly, the MSE of the range-restricted estimator produced from $\min(\hat F_h(t), 1)$ and
$\max(\hat F_h(t), 0)$ will also be improved since it is known the CDF has a range bounded in
[0, 1]. This is formalized in the following corollary.

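Corollary 3. Let $\hat F_h(t)$ be as in Theorem 1. A modified estimator is defined as

$$\check F_h(t) = \min\left\{ \max\left\{ \sup_{s \in (-\infty, t]} \hat F_h(s),\; 0 \right\},\; 1 \right\}.$$

Then it follows that

$$\mathrm{MSE}\left(\check F_h(t)\right) \le \mathrm{MSE}\left(\hat F_h(t)\right)$$

and $\check F_h(t)$ satisfies the necessary properties of a CDF.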
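
On a grid of evaluation points, this standardization amounts to a running maximum followed by clipping to [0, 1]. The Python sketch below shows one way to do it; the function name and the example values are hypothetical, and the input is assumed to be the estimate evaluated on an increasing grid of $t$ values.

```python
def standardize_cdf(values):
    """Make CDF estimates on an increasing grid nondecreasing and confined to [0, 1]."""
    standardized = []
    running_max = float("-inf")
    for value in values:
        running_max = max(running_max, value)               # supremum over (-inf, t]
        standardized.append(min(max(running_max, 0.0), 1.0))  # clip to [0, 1]
    return standardized

# Hypothetical raw estimates with a negative value, a dip, and an overshoot.
raw = [-0.02, 0.10, 0.08, 0.50, 1.03, 0.99]
print(standardize_cdf(raw))
```

The representation in (1) is left untouched; only the evaluated values are post-processed.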



6 Simulations 

We compare the performance of the proposed infinite-order kernel estimators with the
more traditional second-order kernel estimators and the EDF/Kaplan-Meier estimator.
Boundary correction, as described in Section 5, is applied to the estimators when
appropriate. As any choice of function $g(x)$ in (3) will ensure the ideal asymptotics of
an infinite-order kernel, the selection of infinite-order kernels is quite large. An easy
choice for the function $g(x)$ is the straight line truncated at zero, i.e. $g(x) = \left(\frac{|x| - 1}{c - 1}\right)^{+}$,
which gives $k$ a trapezoidal shape. The simulations below consider this trapezoidal
function $k$ with $c = .75$.
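
For the trapezoid, the Fourier integral defining the kernel can be carried out in closed form, giving $K(x) = \frac{\cos(cx) - \cos(x)}{\pi (1 - c) x^2}$ with limiting value $\frac{1 + c}{2\pi}$ at $x = 0$. The Python sketch below checks this closed form against a direct numerical evaluation of $\frac{1}{2\pi}\int k(s) e^{-isx}\,ds$. The function names, the midpoint-rule quadrature, and the cutoff used near $x = 0$ are illustrative choices.

```python
import math

def trapezoid(s, c=0.75):
    """Flat-top function: 1 on [-c, c], linear down to 0 at |s| = 1, 0 beyond."""
    s = abs(s)
    if s <= c:
        return 1.0
    if s < 1.0:
        return (1.0 - s) / (1.0 - c)
    return 0.0

def trapezoid_kernel(x, c=0.75):
    """Closed form of (1 / (2 pi)) * integral of trapezoid(s) * exp(-i s x) ds."""
    if abs(x) < 1e-4:
        return (1.0 + c) / (2.0 * math.pi)    # limiting value as x -> 0
    return (math.cos(c * x) - math.cos(x)) / (math.pi * (1.0 - c) * x * x)

def kernel_by_quadrature(x, c=0.75, steps=20000):
    """Midpoint-rule value of the same integral (the sine part cancels by symmetry)."""
    width = 2.0 / steps
    total = 0.0
    for i in range(steps):
        s = -1.0 + (i + 0.5) * width
        total += trapezoid(s, c) * math.cos(s * x)
    return total * width / (2.0 * math.pi)

for x in (0.0, 0.5, 2.0, 5.0):
    print(x, trapezoid_kernel(x), kernel_by_quadrature(x))
```

The kernel takes negative values (for instance near $x = 5$), which is the behavior that motivates the standardization discussed in Section 5.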

By making the flat-top function $k(x)$ infinitely smooth, the resulting kernel via the
Fourier transform will have tails that decay exponentially. Therefore, when
estimating the density with boundary conditions, the kernel derived from the infinitely
smooth flat-top function is closer to having the desirable quality of being compactly
supported than the kernel derived from the trapezoidal function. One example
of an infinitely smooth $k(s)$ is [17]

$$k(s) = \begin{cases} 1 & \text{if } |s| \le c \\[4pt] \exp\left[\dfrac{-b \exp\left(-b/(|s| - c)^2\right)}{(|s| - 1)^2}\right] & \text{if } c < |s| < 1 \\[4pt] 0 & \text{if } |s| \ge 1 \end{cases} \tag{8}$$

which resembles an infinitely smooth trapezoid and is controlled by the two
parameters $b$ and $c$. In the simulations, we also used this function $k$ for comparisons, with
the parameters $b = 1$ and $c = .05$. A plot of this $k$ is given below.

Figure 1: Infinitely differentiable flat-top function (8)
with parameters $b = 1$ and $c = .05$.

This function is perfectly flat only from 0 to .05, but it is "effectively" flat from 0
to about .5. Therefore the effective flat-top radius is taken to be .5, and it is this value
that is used in the bandwidth selection algorithm described above in Section 4.
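
The "effective" flatness can be checked numerically. The Python sketch below assumes the infinitely smooth flat-top function takes the form $\exp\left[-b \exp\left(-b/(|s| - c)^2\right)/(|s| - 1)^2\right]$ for $c < |s| < 1$, equal to 1 for $|s| \le c$ and 0 for $|s| \ge 1$; this form, like the function name, is an assumption made for illustration.

```python
import math

def smooth_flat_top(s, b=1.0, c=0.05):
    """Infinitely smooth flat-top function (assumed functional form, see lead-in)."""
    s = abs(s)
    if s <= c:
        return 1.0
    if s >= 1.0:
        return 0.0
    inner = math.exp(-b / (s - c) ** 2)          # essentially 0 just above c
    return math.exp(-b * inner / (s - 1.0) ** 2)

for s in (0.0, 0.05, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(s, smooth_flat_top(s))
```

With $b = 1$ and $c = .05$ the value at $s = .5$ is still roughly 0.97, which is consistent with treating .5 as the effective flat-top radius.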

A slightly modified bandwidth selection algorithm was invoked that retains the
function of the bandwidth algorithm described above. The key in the bandwidth
algorithm is to find the smallest value of $t^*$ so that $\hat\phi(t^*) \approx 0$. To automate this
procedure, the value $t^*$ was chosen to be the first value at which $\hat\phi(t^*)$ starts to level
off.

A Gaussian kernel is used in the second-order kernel estimator, and cross validation, 
as suggested in 0], is used to select the bandwidth for this estimator. Estimates were 
simulated over 1000 realizations. 

The first simulation study considers the estimation of a $N(0, 1)$ CDF from iid data.
One may expect the second-order Gaussian kernel estimator to do quite well in this
context, but in fact the infinite-order kernel performs uniformly better. MSE estimates
are provided at three points ($t = -1.5, 0, 1.5$) and under two different sample sizes
($n = 15, 30$).



Table 1: Comparison of the EDF with a Gaussian kernel
estimator and two infinite-order kernel estimators (trapezoid
and smoothed trapezoid) on iid normal data

                  t = -1.5          t = 0           t = 1.5
n                15      30       15      30       15      30
MSE_EDF*        4.30    2.09    16.29    8.73     4.42    2.14
MSE_Gauss*      3.50    1.75    13.02    7.20     3.67    1.82
MSE_trap*       2.85    1.48    11.72    6.49     2.93    1.63
MSE_smooth*     2.95    1.55    12.01    6.71     3.06    1.69

* MSE values are multiplied by 10^3 for easier comparison.
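
A quick benchmark for the EDF row is available in closed form: for iid data the EDF at a fixed $t$ is a binomial proportion, so its exact MSE is $F(t)(1 - F(t))/n$. The Python sketch below evaluates this for the standard normal CDF at the three points and two sample sizes used in the table, scaled by $10^3$; the function name is hypothetical.

```python
from statistics import NormalDist

def edf_mse(cdf_value, n):
    """Exact MSE of the EDF at a point where the true CDF equals cdf_value."""
    return cdf_value * (1.0 - cdf_value) / n

standard_normal = NormalDist()
for t in (-1.5, 0.0, 1.5):
    F_t = standard_normal.cdf(t)
    scaled = [round(1e3 * edf_mse(F_t, n), 2) for n in (15, 30)]
    print(t, scaled)
```

At $t = 0$ this gives about 16.67 and 8.33 for $n = 15$ and $n = 30$, in line with the simulated EDF values.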



The second simulation study considers the estimation of a Weibull distribution
with censored data. Lifetime variables, the variables of interest, are simulated from a
Weibull distribution with shape parameter 3 and scale parameter 1.5, and the censoring
variables are independently drawn from a Weibull distribution with shape parameter 4
and scale parameter 3. Since the support of the lifetime density is the positive real
line, the boundary correction of Section 5 is implemented. MSE estimates are provided
at three points ($t = .75, 1.25, 1.75$) and under two different sample sizes ($n = 15, 30$).
Here again the infinite-order kernels consistently outperform the second-order kernel
estimator and the Kaplan-Meier estimator in terms of MSE performance. In particular,
the smoothed trapezoid is shown to perform well near the boundary point, which can
be attributed to its exponential tails making it more nearly compactly supported.

Figure 2: Lifetime and censoring Weibull densities considered in the simulations with a plot
of the survival function also included.



Table 2: Comparison of the EDF with a Gaussian kernel
estimator and two infinite-order kernel estimators (trapezoid
and smoothed trapezoid) on censored Weibull data

                  t = .75          t = 1.25         t = 1.75
n                15      30       15      30       15      30
MSE_EDF*        6.47    3.51    17.0     7.75    12.0     5.62
MSE_Gauss*      5.45    2.84    10.1     5.27     8.56    4.11
MSE_trap*       5.83    2.70     8.68    4.28     9.32    4.06
MSE_smooth*     5.04    2.36     9.81    4.85     8.84    5.62

* MSE values are multiplied by 10^3 for easier comparison.

7 Discussion and Conclusions



The proposed estimators have implications far beyond just providing more accurate
estimators of the CDF and survival function. For instance, it is standard practice to
compare the effects of two drugs based on their respective survival functions, but the
cost of running clinical trials limits the sample size of the available data. From the
deficiency calculations of Section 3, we see that the proposed estimators can produce
the same results as the traditional Kaplan-Meier estimator yet with a significantly
smaller sample size.

Another very standard use of the EDF is found in the bootstrap method. In the
smoothed bootstrap, data are drawn from a smoothed EDF, and when the estimator
of the smoothed EDF is improved, the smoothed bootstrap is also improved to give
more accurate inferences [9], [18]. The bootstrap method is particularly beneficial when
sample sizes are small, and therefore invoking infinite-order kernel estimators in this
situation is often very natural.

Hazard function estimation on small samples can also be significantly improved.
Hazard estimators constructed by dividing a smoothed density estimate by a smoothed
survival function, as in [17], have performance that is typically dictated by the
convergence of the density estimator [3]. However, in small samples, accurate estimation
of the survival function is just as crucial as accurate estimation of the density.

The new infinite-order kernel estimators of the CDF and survival function are shown
through analysis and demonstrated through simulations to be more accurate than the
EDF and Kaplan-Meier estimators, with significant improvements seen for small sample
sizes and for data from a distribution that has a rapidly decaying characteristic function.
Significant improvements in terms of an increase in efficiency are also produced by the
new estimators when the characteristic function of the data is identically zero beyond
some finite value. Additionally, the bandwidth selection algorithm that accompanies
the new estimators is computationally simpler, with faster convergence rates, than the
cross-validation bandwidth selection algorithms used with finite-order kernels.

A Technical Proofs



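Proof of Theorem 1.

From the following computation

$$E\left[\hat F_h(t)\right] = \frac{1}{n} \sum_{j=1}^{n} E\left[\bar K\left(\frac{t - X_j}{h}\right)\right],$$

computing the bias of $\hat F_h(t)$ amounts to computing the bias of $\bar K\left(\frac{t - X_1}{h}\right)$, where
$\bar K(u) = \int_{-\infty}^{u} K(x)\,dx$ is the integrated kernel. Starting with its expectation, we have

$$\begin{aligned}
E\left[\bar K\left(\frac{t - X_1}{h}\right)\right] &= \int_{-\infty}^{\infty} \bar K\left(\frac{t - x}{h}\right) f(x)\,dx \\
&= \int_{-\infty}^{\infty} \bar K\left(\frac{t - x}{h}\right) dF(x) \\
&= \left[\bar K\left(\frac{t - x}{h}\right) F(x)\right]_{x=-\infty}^{x=\infty} + \frac{1}{h} \int_{-\infty}^{\infty} F(x)\, K\left(\frac{t - x}{h}\right) dx \\
&= \frac{1}{h} \int_{-\infty}^{\infty} F(x)\, K\left(\frac{t - x}{h}\right) dx.
\end{aligned}$$

If we define $K_h(t) = \frac{1}{h} K\left(\frac{t}{h}\right)$, then the expectation above can be written very
simply as

$$E\left[\bar K\left(\frac{t - X_1}{h}\right)\right] = K_h * F(t),$$

where $*$ denotes convolution.

To proceed, we will employ Fourier transform theory on (mathematical)
distributions, otherwise known as generalized functions. By invoking generalized functions,
we can compute the Fourier transform of not just the standard class of integrable
functions, but also many non-integrable functions like constants and cumulative
distribution functions. This theory, in general, is very technical, and readers unfamiliar with
the subject are referred to [2] for a nice treatment of the subject.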

As $K$ is the Fourier transform of $k$, $k$ is therefore the inverse Fourier transform of
$K$. Through a simple change of variables, we have

$$\mathcal{F}^{-1}(K_h)(t) = k(th),$$

where $K_h(t) = \frac{1}{h} K\left(\frac{t}{h}\right)$ and the notation $\mathcal{F}$ and $\mathcal{F}^{-1}$ will represent the
Fourier transform and its inverse.

Next we wish to derive the Fourier transform of the CDF $F(t)$. This is the first
generalized function that we encounter, and its Fourier transform involves the Dirac
delta function, $\delta(s)$. Using the Heaviside step function $H(x)$ given by $H(x) = 1(x > 0)$,
we rewrite $F(t)$ as

$$F(t) = \int_{-\infty}^{\infty} f(x)\, H(t - x)\,dx = f * H(t).$$

Therefore the Fourier transform of $F(t)$ reduces to the product of the Fourier transforms
of $f(x)$ and $H(x)$; i.e.

$$\begin{aligned}
\mathcal{F}^{-1}(F)(s) &= \mathcal{F}^{-1}(f)(s) \cdot \mathcal{F}^{-1}(H)(s) \\
&= \phi(s) \left(\pi \delta(s) + \frac{1}{is}\right) \\
&= \pi \phi(0)\, \delta(s) + \frac{\phi(s)}{is} \\
&= \pi \delta(s) + \frac{\phi(s)}{is}.
\end{aligned}$$

We will now proceed with estimating the bias of $\hat F_h(t)$.



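$$\begin{aligned}
\mathrm{bias}\left(\hat F_h(t)\right) &= K_h * F(t) - F(t) \\
&= \mathcal{F}\left(\mathcal{F}^{-1}\left(K_h * F(t) - F(t)\right)\right) \\
&= \mathcal{F}\left(\mathcal{F}^{-1}(K_h) \cdot \mathcal{F}^{-1}(F) - \mathcal{F}^{-1}(F)\right) \\
&= \mathcal{F}\left(\left(\mathcal{F}^{-1}(K_h) - 1\right) \cdot \mathcal{F}^{-1}(F)\right) \\
&= \mathcal{F}\left[(k(sh) - 1) \left(\pi \delta(s) + \frac{\phi(s)}{is}\right)\right] \\
&= \mathcal{F}\left[(k(sh) - 1)\, \frac{\phi(s)}{is}\right] + \pi\, \mathcal{F}\left((k(sh) - 1)\, \delta(s)\right) \\
&= \mathcal{F}\left[(k(sh) - 1)\, \frac{\phi(s)}{is}\right] + \pi\, \mathcal{F}\left(\delta(s)\right) (k(sh) - 1)\Big|_{s=0} \\
&= \frac{1}{2\pi} \int_{|s| > 1/h} (k(sh) - 1)\, \frac{\phi(s)}{is}\, e^{-ist}\,ds.
\end{aligned}$$

The last equality comes from the flat-top property of the function $k$; specifically, $k(sh) = 1$
for $|sh| \le 1$ implies $k(sh) - 1 = 0$ for $|s| \le 1/h$. Since $k$ is bounded by one, we have
the following bound on the bias of $\hat F_h(t)$,

$$\left| \mathrm{bias}\left(\hat F_h(t)\right) \right| \le \frac{1}{\pi} \int_{|s| > 1/h} \frac{|\phi(s)|}{|s|}\,ds.$$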

We now bound the bias under the three assumptions A(p), B, and C. Under
assumption A(p), we have

$$\int_{|s| > 1/h} \frac{|\phi(s)|}{|s|}\,ds = \int_{|s| > 1/h} \frac{|s|^p\, |\phi(s)|}{|s|^{p+1}}\,ds \le h^{p+1} \int_{|s| > 1/h} |s|^p\, |\phi(s)|\,ds = o\left(h^{p+1}\right). \tag{9}$$

Under assumption B,

$$\int_{|s| > 1/h} \frac{|\phi(s)|}{|s|}\,ds \le h \int_{|s| > 1/h} |\phi(s)|\,ds \le h \int_{|s| > 1/h} D e^{-d|s|}\,ds = \frac{2Dh}{d}\, e^{-d/h} = O\left(h\, e^{-d/h}\right). \tag{10}$$

And under assumption C,

$$\int_{|s| > 1/h} \frac{|\phi(s)|}{|s|}\,ds = 0 \tag{11}$$

when $h \le 1/b$. Therefore parts (i) through (iii) are proven from equations (9), (10), and
(11), respectively.

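Proof of Theorem 3.

If the mean square errors are equal, up to a fraction of the sample size, then we
have

$$\frac{c}{n^r} + \frac{a}{n^{r+\delta}} + o\left(\frac{1}{n^{r+\delta}}\right) = \frac{c}{m^r} + \frac{b}{m^{r+\delta}} + o\left(\frac{1}{m^{r+\delta}}\right),$$

which implies

$$\frac{1}{n^r} \left(c + \frac{a + o(1)}{n^{\delta}}\right) = \frac{1}{m^r} \left(c + \frac{b + o(1)}{m^{\delta}}\right).$$

Dividing through by $c$ and solving for $m/n$ gives

$$\frac{m}{n} = \left(1 + \frac{b + o(1)}{c\, m^{\delta}}\right)^{1/r} \left(1 + \frac{a + o(1)}{c\, n^{\delta}}\right)^{-1/r}.$$

From the above expression, we see that $m/n \to 1$ and therefore $o(1/n) = o(1/m)$.
Using the approximation $(1 + x)^s = 1 + sx + O(x^2)$ gives

$$\frac{m}{n} = 1 + \frac{b}{cr\, m^{\delta}} - \frac{a}{cr\, n^{\delta}} + o\left(\frac{1}{n^{\delta}}\right).$$

Recalling $m = n + d$, we have

$$\frac{d}{n} = \frac{b}{cr\, m^{\delta}} - \frac{a}{cr\, n^{\delta}} + o\left(\frac{1}{n^{\delta}}\right).$$

Multiplying both sides of the above equation by $n$ gives

$$d = \left(\frac{b}{cr} \left(\frac{n}{m}\right)^{\delta} - \frac{a}{cr}\right) n^{1-\delta} + o\left(n^{1-\delta}\right) = \frac{b - a}{cr}\, n^{1-\delta} + o\left(n^{1-\delta}\right).$$

Proof of Theorem 4.

The proof of Theorem 4 follows the same lines as the proof of Theorem 3 with
$n^{\delta}$ replaced with $\log n$.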